- ON ECHO
-
- !*******************************************************************!
- ! !
- ! Cross-section Techniques in SORITEC Sampler !
- ! (Chapter 8) !
- ! !
- !*******************************************************************!
- !
- ! SORITEC Sampler provides two commands for analyzing cross-section
- ! data.
- !
- ! SYNOPSIS produces a detailed summary analysis of data series
- ! including mean, standard deviation, median, quartiles, deciles,
- ! variance, skewness, kurtosis, coefficient of variation,
- ! number of observations, number of missing values, minimum,
- ! range, mode and frequency of the mode.
- !
- ! XTAB calculates a standard row * column crosstabulation report.
- ! Crosstabulation output can be controlled by the interactive
- ! print server if the ON CRT option is enabled.
- !
- ! These commands are demonstrated using sample data drawn from
- ! Berenson, M.L., D.M. Levine and M. Goldstein, Intermediate
- ! Statistical Methods and Applications: A Computer Package
- ! Approach. Englewood Cliffs: Prentice-Hall, 1983, p. 30. The
- ! data are raw values derived from a hypothetical employee
- ! questionnaire and were used to demonstrate features of various
- ! commercial statistical packages. The data base contains the
- ! sex, age, education, employment tenure, job class and weekly
- ! salary of 46 respondents.
- !
- ! Read in the sample data.
- !
-
- USE 1 46
-
- READ sex age education months_employed job_classification salary
- 1 40 3 207 2 458
- 0 21 2 25 1 235
- 1 49 2 390 3 798
- 1 45 3 34 2 339
- 1 25 5 20 2 339
- 1 47 4 209 3 584
- 1 22 3 15 2 296
- 1 22 3 12 2 235
- 1 28 4 70 2 235
- 1 57 3 475 3 571
- 0 20 2 31 2 283
- 1 50 3 364 2 403
- 0 51 3 31 1 363
- 1 62 2 416 2 436
- 1 32 2 129 2 435
- 1 51 4 42 1 334
- 1 55 4 89 2 455
- 0 45 2 274 1 293
- 1 31 5 15 1 228
- 1 62 2 372 2 379
- 1 54 2 83 2 339
- 0 42 2 98 2 270
- 1 26 4 4 2 303
- 1 38 4 43 2 340
- 1 32 2 85 2 359
- 1 45 4 26 2 334
- 1 28 3 99 2 314
- 1 42 4 8 2 340
- 1 50 2 82 2 373
- 1 44 4 211 3 581
- 1 35 4 137 2 451
- 1 68 2 128 2 323
- 1 40 5 18 2 345
- 1 24 5 5 2 333
- 0 27 4 21 1 296
- 1 46 5 87 2 363
- 0 63 3 102 2 325
- 1 28 5 60 2 363
- 1 64 1 338 2 323
- 1 51 2 351 2 323
- 1 47 1 101 2 323
- 0 22 3 28 1 256
- 1 51 5 317 3 515
- 1 29 5 31 3 411
- 1 48 4 316 3 544
- 1 59 5 293 3 450 ;
- END
-
- cls
- !*******************************************************************!
- ! !
- ! Summary Statistics of Cross-section Data !
- ! (Section 8.1) !
- ! !
- !*******************************************************************!
- !
- ! A summary analysis of any variable is produced by the SYNOPSIS
- ! command, i.e.,
- !
-
- SYNOPSIS age months_employed salary
-
- !
- ! All summary statistics produced at the terminal are stored as
- ! internal SORITEC variables and can be retrieved explicitly by
- ! the RECOVER command or referenced implicitly in command lines
- ! by prefixing the internal variable name with a caret (^).
- ! Internal variables are stored as vectors with one element per
- ! argument in the command line. For example,
- ! recovering the means of the variables is accomplished with the
- ! command:
- !
-
- RECOVER mean_values MEANS
-
- !
- ! Print out the vector.
- !
-
- PRINT mean_values
-
- !
- ! Now if we want to access the mean of "months_employed" ...
- !
-
- SET mean_months_employed = mean_values(2)
- PRINT mean_months_employed
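-
- !
- ! The same vector can also be referenced implicitly. As a sketch,
- ! assuming the internal name MEANS still holds the results of the
- ! SYNOPSIS command above, the following should print the same
- ! values as "mean_values":
- !
-
- PRINT ^MEANS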
-
- !
- ! As another example, recover the vector of minimum values.
- !
-
- RECOVER minimum_values MIN
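-
- !
- ! Print out the vector to check the values. Its elements should
- ! follow the order of the SYNOPSIS arguments (age, months_employed,
- ! salary).
- !
-
- PRINT minimum_values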
-
- !
- ! Deciles and quartiles are currently stored as single vectors, so
- ! SORITEC Sampler retains decile and quartile values only for the
- ! last variable in the argument list. Therefore, after executing
- ! the command:
- !
-
- RECOVER salary_quartiles QUARTIL
-
- !
- ! "salary_quartiles" contains statistics on the variable "salary".
- !
-
- PRINT salary_quartiles
-
- !
- ! Note that the same results would be produced by the command:
- !
-
- PRINT ^QUARTIL
-
- !
- ! We'll recover the quartile values for "months_employed" for the
- ! crosstabulation analysis below. To retrieve the data, we must
- ! re-execute the SYNOPSIS command with "months_employed" as the
- ! last argument.
- !
-
- SYNOPSIS months_employed
- RECOVER months_employed_quartiles QUARTIL
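-
- !
- ! Print out the vector to confirm that it now describes
- ! "months_employed" rather than "salary".
- !
-
- PRINT months_employed_quartiles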
-
- !
- ! That's about it for the SYNOPSIS command.
- !
- cls
- !*******************************************************************!
- ! !
- ! Crosstabulation Analysis !
- ! (Section 8.2) !
- ! !
- !*******************************************************************!
- !
- ! Crosstabulation tables are generated by the XTAB command in
- ! SORITEC Sampler. Only a two-way table may be generated, meaning
- ! that only two arguments may follow the command name. The
- ! data series must be DISCRETE data. We'll show you later in this
- ! demo how to use the RECODE command to transform continuous to
- ! discrete data.
- !
- ! Suppose we want to crosstabulate job classification with education.
- ! Both of these variables consist of discrete data with the following
- ! meanings:
- !
- ! Job Classification: (1) clerical
- ! (2) technical
- ! (3) managerial
- !
- ! Education: (1) non-high school graduate
- ! (2) high school graduate
- ! (3) post-high school education
- ! (4) 2-year college degree
- ! (5) 4-year college degree
- !
- ! A crosstabulated table is produced by the command:
- !
-
- XTAB job_classification education
-
- !
- ! Note that the first argument is the row variable and the second is the
- ! column variable.
- !
- ! Recoverable internal SORITEC variables are:
- ! NROW - the number of rows in the table
- ! NCOL - the number of columns in the table
- ! RMARGIN - vector containing row margin values
- ! CMARGIN - vector containing column margin values
- ! XTABLE - matrix of the inner table
- !
- ! Note that the inner table is stored only when the NOMATS option
- ! is OFF (the default).
- !
- ! The inner table can be printed directly by the command:
- !
-
- PRINT ^XTABLE
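-
- !
- ! The table margins can be retrieved in the same way. The lines
- ! below are a sketch that assumes RMARGIN and CMARGIN follow the
- ! RECOVER syntax shown earlier for MEANS and QUARTIL, and that they
- ! still hold the results of the XTAB command above. The names
- ! "row_margins" and "column_margins" are chosen here for
- ! illustration only.
- !
-
- RECOVER row_margins RMARGIN
- RECOVER column_margins CMARGIN
- PRINT row_margins
- PRINT column_margins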
-
- !
- ! When data series are continuous, they must first be transformed into
- ! discrete values before they can be crosstabulated. This is done
- ! in SORITEC Sampler with the RECODE command (Section 6.3).
- ! With the RECODE command, the arguments in the command line are
- ! the output data series, the input data series and the interval
- ! boundaries over which you want the data categorized, ranging from
- ! minimum to maximum values. Interval boundaries may be numbers,
- ! CONSTANTS or PARAMETERS.
- !
- ! For example, suppose we want to crosstabulate sex versus salary,
- ! and months employed versus salary. We'll use the quartiles and
- ! minimums we saved from the SYNOPSIS command to define the quantiles
- ! and automate the procedure as much as possible.
- !
- ! The first crosstabulation will be sex versus salary.
- !
- ! First, recover the current USE period and store it for later use
- ! in the variable "use_period".
- !
-
- RECOVER use_period USE
-
- !
- ! Now define the interval boundaries, which we will call p1, p2, p3,
- ! p4 and p5, as the GROUP "pgroup".
- !
-
- GROUP pgroup p1 p2 p3 p4 p5
-
- !
- ! Assign the minimum interval boundary, p1, to be the minimum salary
- ! of the sample, which we saved in the VECTOR "minimum_values".
- ! The minimum salary is the 3rd element of this vector.
- !
-
- SET p1 = minimum_values(3)
-
- !
- ! Since the interval boundaries in the RECODE command must be
- ! numbers, CONSTANTS or PARAMETERS, the remaining boundaries can be
- ! taken from the vector of quartiles in one of two ways: extract
- ! the individual elements and assign them to CONSTANTS, or reference
- ! each vector element explicitly in the RECODE argument list. In the
- ! first crosstabulation, we'll extract the individual elements to
- ! demonstrate some of the tricks you can use when applying SORITEC
- ! Sampler. In the second example, we'll take a "short-cut" approach.
- !
- ! In SORITEC, extraction of individual elements from data series
- ! is facilitated by the GETOBS command. This command is not
- ! supported by SORITEC Sampler, however. We can still extract
- ! individual elements and assign them to CONSTANTS with the
- ! following set of commands.
- !
- ! First, change the USE period to the length of the vector of quartiles.
- !
-
- USE 1 4
-
- !
- ! Then transform the vector into a data series.
- !
-
- temp = salary_quartiles
-
- !
- ! Lastly, set up a DOT-loop that sequentially assigns each element
- ! of the data series to the remaining interval boundaries for the
- ! RECODE command.
- !
- !
- ! Initialize an index counter for accessing each element of the
- ! temporary data series, "temp".
- !
-
- SET j = 0
-
- DOT p2 p3 p4 p5
-
- SET j = j + 1
-
- !
- ! Reset the USE period to each element of the data series.
- !
-
- USE j
-
- !
- ! Assign an element of the data series to an interval boundary.
- !
-
- SET : = temp
-
- ENDDOT
-
- !
- ! Print out the group to ensure the correct values have been assigned
- ! to the interval boundaries. (Refer to the interval boundaries as a
- ! group.)
- !
-
- ON GROUP
-
- PRINT pgroup
-
- !
- ! Now restore the original USE period which we previously recovered.
- !
-
- USE use_period
-
- !
- ! Check to be sure the original USE period has been restored.
- !
-
- USE
-
- !
- ! Then use the RECODE command to transform the continuous data series
- ! "salary" into the discrete series "salary_levels".
- !
-
- RECODE salary_levels salary pgroup
-
- !
- ! Print out both variables.
- !
-
- PRINT salary salary_levels
-
- !
- ! Finally, crosstabulate sex versus salary_levels.
- ! Sex is coded 0 = female and 1 = male in the data.
- !
-
- XTAB sex salary_levels
-
- !
- ! Repeat the procedure for months employed versus salary.
- ! This time, however, reference the relevant elements of the
- ! vectors "minimum_values" and "months_employed_quartiles"
- ! directly in the RECODE command line.
- !
-
- RECODE months_employed_intervals months_employed minimum_values(2) &
- months_employed_quartiles(1) months_employed_quartiles(2) &
- months_employed_quartiles(3) months_employed_quartiles(4)
-
- !
- ! Not as elegant, but it does the trick! See?
- !
-
- PRINT months_employed months_employed_intervals
-
- !
- ! Now generate the crosstabulation table.
- !
-
- XTAB months_employed_intervals salary_levels
-
- !
- ! That's it!
- !
- cls
- QUIT
-